File-Based Storage of Digital Objects and Constituent Datastreams: XMLtapes and Internet Archive ARC Files

Authors

  • Xiaoming Liu
  • Lyudmila Balakireva
  • Patrick Hochstenbach
  • Herbert Van de Sompel
Abstract

This paper introduces the write-once/read-many XMLtape/ARC storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a protocol-based manner. First, XML-based representations of multiple Digital Objects are concatenated into a single file named an XMLtape. An XMLtape is a valid XML file; its format definition is independent of the choice of the XML-based complex object format by which Digital Objects are represented. Indexes on both the identifier and the creation datetime of the XML-based representation of each Digital Object facilitate OAI-PMH-based access to the Digital Objects stored in an XMLtape. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects in a concatenated manner. An index on the identifier of each datastream facilitates OpenURL-based access to an ARC file. The interconnection between XMLtapes and ARC files is provided by conveying the identifiers of the ARC files associated with an XMLtape as administrative information in the XMLtape, and by including OpenURL references to the constituent datastreams of a Digital Object in the XML-based representation of that Digital Object.
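The concatenate-and-index idea in the abstract can be illustrated with a minimal Python sketch. It is not the XMLtape format itself: the `<tape>` and `<record>` element names, the `id` and `created` attributes, the `info:example/...` identifiers, and the in-memory list used as an index are all assumptions made for illustration. The sketch writes several XML fragments into one well-formed XML file once, records the byte offset of each, and then serves lookups by identifier and by datetime range, which is roughly the kind of access OAI-PMH requests need.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def write_tape(stream, records):
    """Concatenate records into one XML file and return a byte-offset index.

    `records` yields (identifier, datestamp, xml_fragment) tuples. Each
    fragment is assumed to be well-formed XML without its own declaration.
    The <tape> and <record> element names are hypothetical.
    """
    index = []
    stream.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<tape>\n')
    for identifier, datestamp, fragment in records:
        start = stream.tell()
        opening = "<record id=%s created=%s>" % (
            quoteattr(identifier), quoteattr(datestamp))
        stream.write(opening.encode("utf-8"))
        stream.write(fragment.encode("utf-8"))
        stream.write(b"</record>\n")
        # Remember where the record starts and how many bytes it spans.
        index.append((identifier, datestamp, start, stream.tell() - start))
    stream.write(b"</tape>\n")
    return index


def read_record(stream, index, identifier):
    """Fetch one record by identifier by seeking, without parsing the tape."""
    for rec_id, _datestamp, start, length in index:
        if rec_id == identifier:
            stream.seek(start)
            return stream.read(length)
    raise KeyError(identifier)


def identifiers_between(index, from_date, until_date):
    """List identifiers whose datestamp lies in [from_date, until_date].

    Assumes UTC datestamps in one fixed ISO 8601 layout, so that string
    comparison matches chronological order.
    """
    return [rec_id for rec_id, datestamp, _start, _length in index
            if from_date <= datestamp <= until_date]


# Write once.
tape = io.BytesIO()
index = write_tape(tape, [
    ("info:example/obj-1", "2005-01-10T00:00:00Z", "<item>first</item>"),
    ("info:example/obj-2", "2005-06-03T00:00:00Z", "<item>second</item>"),
])

# Read many: the whole tape parses as XML, and single records are
# reachable directly through the index.
root = ET.fromstring(tape.getvalue())
record = ET.fromstring(read_record(tape, index, "info:example/obj-2"))
```

A real deployment would presumably keep the index in a persistent lookup structure rather than a Python list; the list is used here only to keep the sketch self-contained.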


Similar resources

File-based storage of Digital Objects and constituent datastreams: XMLtapes and Internet Archive ARC files (arXiv:cs.DL/0503016v2, 3 Jun 2005)

This paper introduces the write-once/read-many XMLtape/ARC storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a protocol-based manner. First, XML-based representations of multiple Digital Objects are concatenated into a single file named an XMLtape. An XMLtape is a valid XML fi...


aDORe: A Modular, Standards-Based Digital Object Repository

This paper describes the aDORe repository architecture designed and implemented for ingesting, storing, and accessing a vast collection of Digital Objects at the Research Library of the Los Alamos National Laboratory. The aDORe architecture is highly modular and standards-based. In the architecture, the MPEG-21 Digital Item Declaration Language is used as the XML-based format to represent Digit...


Migrating Content in WARC Files

Heritage institutions all over the world have started harvesting and preserving resources of the World Wide Web for future generations as part of our cultural heritage. This task is non-trivial because of two complex challenges: (1) crawling the enormous amount of data on the Internet and (2) applying long-term preservation strategies to these data. Nowadays a lot of effort i...


Studies on the scalability of web preservation

This paper describes a mechanism for improving the scalability of preservation actions on large linked archives, such as WARC and ARC files produced from the archiving of web sites. To enable accurate but efficient preservation actions, information on the files embedded within a container object, such as the file formats of the embedded files, is aggregated and recorded as properties of the co...


Repository and Preservation Storage Architecture

While the Open Archival Information System (OAIS) model has become the de facto standard for preservation archives, the design and implementation of a repository or reliable long-term archive lacks adopted technology standards and design best practices. This paper is intended to provide guidelines and recommendations for standards implementation and best practices for a viable, cost effective, a...




Publication date: 2005